Search Space Reduction for Farsi Printed Subwords Recognition by Position of the Points and Signs

Abstract:

In word recognition, three approaches are used: segmenting words into letters, recognizing the overall (holistic) shape, and a combination of the two. Most optical recognition methods break a word into its letters and then recognize them. This approach runs into problems because letters are difficult to isolate and recognition accuracy drops in texts with low image quality. A segmentation-free approach can therefore be useful in such cases. In holistic-shape methods for subword recognition, features are extracted from the subword and then usually searched for in the image dictionary created in the training phase. Because the number of classes is very large, limiting the scope of the search is the main challenge for holistic-shape methods, and holistic-shape information is usually used to narrow the search hierarchically. This paper aims to reduce the subword search space sharply with a simple and efficient method.

In the training phase, the training data are grouped by the location of the points and signs. Groups with more than 10 subwords are further clustered, according to the number of elements in the group, using simple features extracted from the horizontal and vertical profiles. In the recognition phase, the first step determines the width-to-height ratio of the subword (with and without signs) and the position code of its points and signs; the search is then limited to subwords with that position code whose ratios fall within the corresponding range. This candidate set is accepted if it contains fewer than ten subwords. Otherwise, in the next step, simple features of the horizontal and vertical profiles of the subword are extracted, and the search space is limited to the clusters closest to the subword that also satisfy the width-to-height ratio. With the proposed method, the search space falls to an acceptable level. The study used a database of 12,700 subwords in five fonts (Lotus, Zar, Nazanin, Mitra and Yaghut) scanned at 400 dpi. Lotus, Zar, Nazanin and Mitra were used in the training phase, and Yaghut was used in the test phase.
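The two-step narrowing described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Entry` record, the string form of the position code, the relative tolerance `ratio_tol`, the number of profile bins and the `keep` parameter are all assumptions made for illustration. The second step also replaces the paper's clustering with a plain nearest-neighbour ranking on the profile features, to keep the example short.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import dist
from typing import Dict, List, Sequence, Tuple


@dataclass
class Entry:
    label: str                    # subword identity in the dictionary
    position_code: str            # hypothetical encoding of point/sign positions
    ratios: Tuple[float, float]   # width/height with signs, without signs
    profile: List[float]          # pooled projection-profile features


def profile_features(img: Sequence[Sequence[int]], bins: int = 4) -> List[float]:
    """Pool the horizontal and vertical ink profiles of a binary image
    (1 = ink) into `bins` values each, normalized by the total ink."""
    rows = [sum(r) for r in img]        # ink count per row
    cols = [sum(c) for c in zip(*img)]  # ink count per column
    total = sum(rows) or 1              # avoid division by zero on blank input

    def pool(v: List[int]) -> List[float]:
        n = len(v)
        return [sum(v[i * n // bins:(i + 1) * n // bins]) / total
                for i in range(bins)]

    return pool(rows) + pool(cols)


def build_index(entries: Sequence[Entry]) -> Dict[str, List[Entry]]:
    """Training side: group dictionary entries by position code."""
    index: Dict[str, List[Entry]] = defaultdict(list)
    for e in entries:
        index[e.position_code].append(e)
    return index


def reduce_search_space(index: Dict[str, List[Entry]], code: str,
                        ratios: Tuple[float, float], profile: List[float],
                        ratio_tol: float = 0.15, max_candidates: int = 10,
                        keep: int = 5) -> List[Entry]:
    """Recognition side: filter by position code and width-to-height ratios,
    then fall back to profile similarity if too many candidates remain."""
    # Step 1: same position code, both ratios within a relative tolerance.
    stage1 = [e for e in index.get(code, [])
              if all(abs(a - b) <= ratio_tol * b
                     for a, b in zip(e.ratios, ratios))]
    if len(stage1) < max_candidates:    # fewer than ten: accept as is
        return stage1
    # Step 2 (simplified): keep the entries with the closest profiles.
    stage1.sort(key=lambda e: dist(e.profile, profile))
    return stage1[:keep]
```

In use, `build_index` would be called once on the training dictionary, and `reduce_search_space` once per query subword; the returned short list would then be passed to whatever full matcher performs the final recognition.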

Similar resources

Using compatible shape descriptor for lexicon reduction of printed Farsi subwords

This paper presents a method for lexicon reduction of printed Farsi subwords based on their holistic shape features. Because Persian subwords are numerous and vary in shape from a single letter to a complex combination of several connected characters, it is not easy to find a fixed shape descriptor suitable for all subwords. In this paper, we propose to select the descriptor according ...

The Search for the Self in Beckett's Theatre: Waiting for Godot and Endgame

This thesis is based upon the works of Samuel Beckett, one of the greatest writers of contemporary literature. Here, I have tried to focus on one of the main themes in Beckett's works: the search for the real "me" or the real self, which is a problem to be solved not only for Beckett's man but also for each of us. I have tried to show Beckett's techniques in approaching this unattainable goal, base...

Automatic recognition of printed Farsi texts

The automatic recognition of printed Farsi (Persian) texts is complicated by several properties of the Farsi script: (a) connectivity of symbols, (b) similarity of groups of symbols, (c) highly variable widths, (d) subword overlap, and (e) line overlap. In this paper, a technique for the automatic recognition of printed Farsi texts is presented and its steps are discussed as follows: (1) digi...

Study of Cohesive Devices in the Textbook of English for the Students of Psychology by Rastegarpour

This study investigates the cohesive devices used in the textbook of English for the students of psychology. The research questions and hypotheses in the present study concern the frequency and distribution of grammatical and lexical cohesive devices. To answer the questions, all grammatical and lexical cohesive devices in reading comprehension passages from 6 units of 21 units th...

Script Recognition using Inhomogeneous P2DHMM and Hierarchical Search Space Reduction

P2DHMMs manage the assignment of observations to states quite well, whereas the calculation of probabilities is a great problem. To overcome some of the defects of P2DHMMs we propose the inhomogeneous P2DHMM (IP2DHMM). In contrast to some other approaches it is able to model consistently state duration and observation matrices of fixed height at no additional cost in computational load and numb...

The Effect of Consciousness Raising (C-R) on the Reduction of Translational Errors: A Case Study

In translation training courses, instructors mostly try to familiarize students with various types of texts, while paying less attention to the students' recurring errors in the translated text. The significance of the present research lies in the fact that students repeatedly commit translational errors even after completing specialized translation courses. Its aim is to highlight the errors common among translation students and to reduce these errors by raising the students' awareness and consciousness of their occurrence. Since ...

Volume 16, Issue 3

Pages 101-116

Publication date: 2019-12
